Search Query Understanding with LLMs: From Ideation to Production
Ali Rokni • Location: Theater 5 • Haystack 2024
“Understanding user intent from search queries is a critical challenge for providing relevant search results. Yelp has leveraged Large Language Models (LLMs) to enhance user query understanding, marking a significant shift away from more traditional techniques. In this talk, we present our journey from initial ideation to the full-scale production deployment of LLMs for various query understanding tasks: spelling correction, segmentation, canonicalization, expansion, and highlighting. Key factors that make query understanding a potentially strong use case for LLMs include the narrow, query-focused nature of the task and the small volume of text to process. The transition to LLMs from previously fragmented systems has added significant intelligence and greatly improved the user experience within our search functionality.
The process consists of several stages, beginning with ideation and formulation: assessing the suitability of an LLM and defining the scope of the task. We also explored forms of Retrieval Augmented Generation (RAG) to enhance the model's decision-making by incorporating information beyond the query text. Given the distribution of query frequencies, we designed proof-of-concept systems that cached responses for high-frequency queries to keep latency and cost under control. We verified the impact of these changes with both offline and online evaluations.
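As a rough illustration of how query-frequency skew can motivate a cache, the following minimal Python sketch picks the smallest set of top queries whose combined traffic reaches a target coverage fraction. The function name `select_head_queries`, the `coverage` parameter, and the lowercase/strip normalization are illustrative assumptions, not details from the talk.

```python
from collections import Counter


def select_head_queries(query_log: list[str], coverage: float) -> list[str]:
    """Return the most frequent queries whose combined traffic reaches `coverage`.

    `coverage` is a fraction in (0, 1]; e.g. 0.5 means "cache enough head
    queries to serve at least half of all logged traffic from the cache".
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Assumed normalization: trim whitespace and lowercase before counting.
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    selected: list[str] = []
    covered = 0
    # most_common() yields queries from most to least frequent.
    for query, count in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(query)
        covered += count
    return selected
```

Because query traffic is typically heavily skewed, a small head set can cover a large share of requests, which is what makes precomputing expensive LLM outputs for those queries attractive.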
Scaling up presented its own set of challenges, particularly in managing the vast number of queries and the related cost and latency implications. Motivated by noticeable improvements in user experience and performance metrics, we executed a multi-step scaling process. This involved creating and refining a dataset for fine-tuning smaller models and ultimately deploying an economically efficient, real-time model to handle less frequent, long-tail queries.”
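The head/tail split described above might be wired together roughly as follows. This is a hypothetical sketch, assuming a dictionary of precomputed LLM outputs for head queries and a callable standing in for the fine-tuned real-time model. `QueryUnderstanding`, `HybridQueryUnderstander`, `head_cache`, and `tail_model` are invented names, and the output fields are placeholders, since the abstract does not describe an output format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QueryUnderstanding:
    # Placeholder output fields (hypothetical); the talk lists tasks, not a schema.
    corrected: str
    segments: list[str] = field(default_factory=list)


@dataclass
class HybridQueryUnderstander:
    # Offline-precomputed LLM outputs for high-frequency (head) queries.
    head_cache: dict[str, QueryUnderstanding]
    # Real-time fallback for long-tail queries, e.g. a fine-tuned smaller model.
    tail_model: Callable[[str], QueryUnderstanding]
    cache_hits: int = 0
    model_calls: int = 0

    def understand(self, raw_query: str) -> QueryUnderstanding:
        # Assumed normalization; must match how the cache keys were built.
        key = raw_query.strip().lower()
        cached = self.head_cache.get(key)
        if cached is not None:
            self.cache_hits += 1
            return cached
        # Cache miss: fall back to the cheaper real-time model.
        self.model_calls += 1
        return self.tail_model(key)
```

For example, with `head_cache={"pizza": QueryUnderstanding("pizza", ["pizza"])}` and a stand-in `tail_model` that splits the query on spaces, `understand(" Pizza ")` is served from the cache, while `understand("vegan ramen near me")` goes to the fallback model. Keeping the counters makes it easy to monitor what fraction of traffic the cache actually absorbs.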
Ali Rokni
Yelp
Ali is a seasoned Senior Machine Learning Engineer and Tech Lead at Yelp, where he focuses on advancing search quality and learning-to-rank models. Through his recent work, Ali has refined query understanding and document retrieval processes using Large Language Models (LLMs), resulting in a significant gain in the main search quality metrics. Prior to this, Ali contributed to various projects, including addressing foundational ranking problems such as sampling and position bias in search data, and modernizing the learning-to-rank model training pipeline.